Skip to content

FE-883: Orchestrator operational improvements (serve reliability + Opus 4.8)#224

Open
kostandinang wants to merge 47 commits into
ka/fe-878-brunch-servefrom
ka/fe-864-pi-timeout-600s
Open

FE-883: Orchestrator operational improvements (serve reliability + Opus 4.8)#224
kostandinang wants to merge 47 commits into
ka/fe-878-brunch-servefrom
ka/fe-864-pi-timeout-600s

Conversation

@kostandinang

@kostandinang kostandinang commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Stack Context

Stacks on FE-878 (#222). FE-883 — operational improvements under the FE-864 brownfield-orchestration umbrella (planning PR #212). No longer the single timeout tweak it started as — this branch collects the operational improvements that make brunch serve usable on real brownfield runs.

What?

  • Cook agent action timeout → 600s — long real-repo actions no longer abort prematurely.
  • Orchestration on Opus 4.8 — interviewer, cook agents, and the architect default (DEFAULT_ARCHITECT_MODEL_ID) all run claude-opus-4-8; the interviewer uses adaptive thinking + effort, and a fix stops 4.8 fragmenting ask_question into failed tool parts.
  • Slice grid fidelity — fail rows when the writer pi aborts (including verify-epic), keep the grid current during verification, surface the failure reason + a pinned halt summary, show per-slice attempt count, and a "not-started" evaluate gate so a greenfield slice isn't painted ✗ before its tests exist.
  • serve --land — merge the promoted cook branch into the active branch as the final serve step; greenfield runs (no repo branch to land onto) warn instead of silently ignoring the flag.
  • Run-artifact composer — a git-merge-based run artifact (run-artifact.ts / run-refs.ts) composes the completed run.
  • Epic-merge dep resolution — the merged __epic__ tree re-links the parent's symlinked node_modules so verify-epic and probes have resolvable deps.
  • Dev-server quiet — stop the dev server reloading on .brunch cook worktrees.
  • Presentation polish — recipe banner lexicon, exact out-dir reporting, verbose warnings.

Why?

The spec-23 run exposed serve reliability as a broad issue, not a single timeout: actions timed out on real repos, the architect's model default produced weaker fallback plans that amplified slice counts, and slice failures — especially writer and epic-verify aborts — vanished from the grid with no reason. This branch is the FE-864 umbrella's implementation seam for those concrete operational fixes.

Tests

Per-seam unit coverage for the changes above: serve --land arg mapping + greenfield warning, verify-epic failed-row emission, the epic-merge node_modules re-link, the architect default model id, the run-artifact composer, and the slice-grid presenter.

@cursor

cursor Bot commented Jun 16, 2026

Copy link
Copy Markdown

PR Summary

Medium Risk
Touches brownfield git promotion, optional merge onto the user’s active branch, and widespread ref renames—behavior changes are mostly covered by tests but real-repo edge cases (conflicts, stale worktrees) warrant careful manual verification.

Overview
This PR broadens FE-864 from a timeout tweak into orchestration and UX changes aimed at real brownfield brunch serve runs.

Cook execution and sandboxes: Per-slice git worktrees are provisioned lazily on first use (ensureSliceWorktree) instead of seeding every slice up front. node_modules is symlinked from the parent worktree (and re-linked after epic merges) rather than CoW-copied per slice. Worktree setup runs git worktree prune before adds. The pi agent timeout rises to 600s; cook rejects unknown CLI flags and throws on plan/Petrinaut errors so the Ink presenter can unmount cleanly.

Git artifacts: Run/slice refs move from cook/… / cook-slice/… to centralized brunch/run/<id> and brunch/slice/<id>/<slice> via brunchRef. landCookBranch optionally merges a promoted run branch onto the active branch ( serve --landlandBranch; refuses dirty/detached HEAD, aborts on conflict). New run-artifact (harvestCookRun, merge-tree fold) adds per-slice git history with fail-closed conflicts (tests only in this diff—not yet wired into the main promote path).

Live CLI: The event bus gains run-shape, slice, and cook-done. The Ink TUI shows an epic/slice grid (steps, retries, failure reasons, halt summary), tool-call heartbeats (instrumented pi tools + latest-line progress), a figlet wordmark in scrollback via <Static>, and brigade serve when a run completes successfully. evaluate-done treats “no test files” as absent so greenfield gates don’t paint phantom failures; runner output strips [agent-tail] noise.

Defaults and misc: Architect/cook models default to Opus (claude-opus-4-8 / docs claude-opus-4-6). Vite dev ignores .brunch/**. Chat API accepts follow-up history with output-error tool parts.

Reviewed by Cursor Bugbot for commit 21e1653. Bugbot is set up for automated code reviews on this repo. Configure here.

@kostandinang kostandinang changed the title FE-864: raise pi action timeout to 600s FE-864: Raise pi-action timeout to 600s Jun 16, 2026
@kostandinang kostandinang force-pushed the ka/fe-878-brunch-serve branch from 90bb5ef to 40a9d88 Compare June 16, 2026 18:05
@kostandinang kostandinang force-pushed the ka/fe-864-pi-timeout-600s branch from 5b145b2 to 7d39fe7 Compare June 16, 2026 18:05
@kostandinang kostandinang changed the title FE-864: Raise pi-action timeout to 600s FE-864: Orchestrator improvements umbrella — brownfield feature delivery from spec Jun 16, 2026
Comment thread src/orchestrator/src/epic-sandbox-merge.ts
Comment thread src/orchestrator/src/epic-sandbox-merge.ts
@kostandinang kostandinang force-pushed the ka/fe-864-pi-timeout-600s branch from a5bfc10 to ac4e47c Compare June 16, 2026 23:45
@kostandinang kostandinang force-pushed the ka/fe-878-brunch-serve branch from 40a9d88 to 05b471a Compare June 16, 2026 23:45
Comment thread src/orchestrator/src/pi-actions.ts Outdated
@kostandinang kostandinang force-pushed the ka/fe-878-brunch-serve branch from 05b471a to 89e7850 Compare June 16, 2026 23:55
@kostandinang kostandinang force-pushed the ka/fe-864-pi-timeout-600s branch from ac4e47c to 3e02bfb Compare June 16, 2026 23:55
Comment thread src/orchestrator/src/net-compiler.ts
Comment thread src/orchestrator/src/presenter/run-store.ts
@kostandinang kostandinang changed the title FE-864: Orchestrator improvements umbrella — brownfield feature delivery from spec FE-864: Orchestrator improvements Jun 17, 2026
kostandinang and others added 6 commits June 17, 2026 09:36
Each cook agent action (write-tests, write-code, verify-epic) runs under a
per-action wall-clock budget enforced in pi-actions.ts. Raise the default
from 300s to 600s so Sonnet agents have headroom on larger slices and on
brownfield repos where setup/discovery eats into the turn.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…eam, clean failures

Iterating on the live TUI from real-terminal feedback:

- One global run timer in the footer instead of a per-item clock on every
  pending row (and whole-second, no jittery decimals).
- "brunch" wordmark is now a big lowercase figlet (Slant) in a warm orange
  gradient, replacing the egg.
- Activity log + wordmark stream through Ink <Static> so the full run lands in
  scrollback instead of collapsing in a redrawn bounded box; line cap removed.
- Brigade tracker no longer lights "taste" mid-cook — per-slice verify actions
  fire during cooking, so taste stays unlit until a real end-of-cook signal.
- Failures throw instead of process.exit, so withCookBus disposes (unmounts
  Ink) before the error prints — no more frozen "prep ◐" hang. cook validates
  args before mounting the TUI and rejects unknown flags (e.g. --spec-id).

check + presenter/cook/pi-actions tests green; full build deferred (active
graphite stack navigation).

Co-Authored-By: Claude <noreply@anthropic.com>
Wires the two remaining kitchen-brigade phases faithfully to the
orchestrator-arcs mapping (verify→taste, ship→serve):

- taste lights on the epic-verification verdict (action `epic <id> → …`), not
  on per-slice `verify <target>` lines — those fire mid-cook and previously lit
  taste while still cooking.
- serve lights on a new `cook-done` event emitted at the end of runCook (after
  promotion); a halted run never ships, so it never lights serve.

phase.test covers both signals + the full prep→cook→taste→plate→serve walk;
check + presenter/cook tests green.

Co-Authored-By: Claude <noreply@anthropic.com>
Each cook pi session was a black box in the pending panel — just a KB count.
runPi already subscribes to the session's text stream; instead of bytes,
surface the agent's latest non-empty line (tail-truncated, throttled every
2 KB) as the activity-progress detail, so a wait reads as live work ("agent
writing tests · …adds the RefreshToken guard") rather than "still going".

Kept headless createAgentSession — no pi InteractiveMode, no new pi API: pi's
tool-call events come via an extension hook (on('tool_call')), not the
subscribe stream, so a richer "editing <file> / running <tool>" heartbeat is a
separate follow-up that needs the extension-registration path verified.

check + pi-actions/presenter tests green.

Co-Authored-By: Claude <noreply@anthropic.com>
Richer "what the agent is doing" in the pending panel (the spike's Option A,
full tier): instead of only the agent's latest line, show the tool calls —
"edit src/auth/token.ts", "bash bun test", "grep RefreshToken".

pi exposes no tool-call hook on session.subscribe (text/lifecycle only), so
buildSessionOptions now supplies the built-in tools itself via customTools +
noTools:'builtin': each createXToolDefinition(cwd) is wrapped to emit a label
from its params, then delegates unchanged. The builders bake in the real
config (withFileMutationQueue, truncation defaults), so behavior is preserved
— confirmed in pi's edit.js. Observation is fail-safe (emit in try/catch).

toolLabel + instrumentToolDefinition are pure/unit-tested (label mapping;
wrap delegates same args + result; observer error can't break a tool call).

Caveat: the customTools/noTools runtime wiring isn't covered by tests (they
stub createSession, bypassing buildSessionOptions) — needs a real cook run to
confirm the agent receives the instrumented tools and they emit live.

check + pi-actions tests green.

Co-Authored-By: Claude <noreply@anthropic.com>
Brownfield cook provisioned every slice's git worktree eagerly in
wireHandlers — N `git worktree add` + N recursive node_modules CoW copies
paid synchronously at startup before any slice fired.

- Move slice-worktree creation into resolveSliceCwd via idempotent
  ensureSliceWorktree, so a slice's worktree is materialized on first fire.
  A run touching 2 of 8 slices pays for 2 worktrees, not 8. Synchronous
  provisioning serializes concurrent fires on the JS thread, so parallel-policy
  worktree adds never overlap.
- Symlink each slice's node_modules to the parent worktree's single copy
  instead of CoW-copying per slice (SHAREABLE_TOP_LEVEL_ENTRIES). walkFiles
  already skips symlinks, so the shared tree is never re-walked during dep
  seeding, merge, or promotion. Other gitignored dirs still copy per slice.

Correctness-neutral: same worktrees/branches, just lazy; deps resolve through
the symlink. npm run verify green; adds symlink + idempotency unit tests.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Comment thread src/orchestrator/src/pi-actions.test.ts
Comment thread src/canvas/edgeStyles.ts Outdated
Comment thread src/orchestrator/src/pi-actions.ts
kostandinang and others added 4 commits June 17, 2026 12:19
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
`brunch serve --land` opt-in: after a brownfield cook promotes to
cook/<runId>, merge that branch into the repo's checked-out branch as
serve's final step. landCookBranch refuses on a dirty tree / detached
HEAD and aborts on conflict — the cook branch is always left intact.
Plain `cook` and default `serve` are unchanged, so FE-877's "never
freelance into the working branch" invariant holds unless opted in.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The interviewer (interview.ts) and the three cook agent actions
(write-tests / write-code / verify-epic in pi-actions.ts) move from
claude-opus-4-6 to claude-opus-4-8. Secondary-chat and the plan-architect
default are left on 4.6 intentionally.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@kostandinang kostandinang force-pushed the ka/fe-864-pi-timeout-600s branch from 363cba6 to 91a281d Compare June 17, 2026 11:59
Comment thread src/orchestrator/src/cook-cli.ts
Comment thread src/orchestrator/src/net-compiler.ts
kostandinang and others added 3 commits June 17, 2026 13:04
Opus 4.8 rejects thinking.type=enabled ("Use thinking.type.adaptive and
output_config.effort"). Swap the interviewer's enabled/budgetTokens block
for { type: 'adaptive' } + effort: 'medium'. Only interview.ts sets explicit
thinking provider options; secondary-chat and plan-architect don't.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The grid made the happy path legible; this does the unhappy path.

- events: slice + cook-done gain `reason?`.
- pi-actions: failed-slice emits carry a reason — evaluate-done maps failureKind
  ('infra error' / 'tests failed'); write-tests/write-code catches use
  'test authoring failed' / 'code authoring failed'.
- cook-cli: cook-done carries result.reason on halt.
- run-store: slice rows store `reason`; RunState.haltReason set from
  cook-done(ok:false), unset on completion.
- ink: a failed slice row shows its reason (red); a HaltSummary footer pins
  "✗ halted · <reason>" (truncated) when the run halts.

Retry-attempt counts ("3/3") DEFERRED — retryCount lives on the petri-net
tokens, not the action context; that's a separate slice.

Verified in an isolated worktree off the TUI branch: check clean; presenter +
pi-actions + cook tests green (141). The one full-suite failure is
build-boundary.test (client production-chunk manifest) — unrelated to this
server-only change; a worktree/symlinked-node_modules artifact, re-verify on a
normal checkout.

Co-Authored-By: Claude <noreply@anthropic.com>
Completes the retry half of failure legibility — without net-token plumbing.
A slice loops failed→running on retry, so the attempt count is derivable from
the slice-event stream the store already folds: queued→running is attempt 1, a
step change mid-run keeps it, failed→running bumps it. The grid shows "attempt N"
only for N≥2 (no clutter on the first run).

No engine/net changes: retryCount on the petri tokens isn't needed — the
observable slice transitions carry the same information.

check + presenter tests green (46).

Co-Authored-By: Claude <noreply@anthropic.com>
Comment thread src/orchestrator/src/app-probe.test.ts
The attempt count now reads "attempt 2/4" — n over the total attempts allowed
(retry budget + 1). cook-cli emits maxRetries on run-shape; the store derives
maxAttempts = maxRetries + 1; the grid shows it only once a slice has retried
(n ≥ 2), and falls back to a bare "attempt n" when the budget is unknown.

Denominator confirmed against the net: retryCount halts at `>= maxRetries`
(starts 0), so a slice gets maxRetries + 1 total attempts.

check + presenter tests green (49).

Co-Authored-By: Claude <noreply@anthropic.com>
Comment thread src/orchestrator/src/cow-copy.ts
Opus 4.8 otherwise splits a single `ask_question` into several parallel
partial calls (one option each, or leaked tool-call XML), which land as
`output-error` parts.

- interview.ts: set `disableParallelToolUse` so the model emits one tool
  call per step.
- app.ts: `stripFailedToolParts` drops echoed `output-error` tool parts
  before `validateUIMessages`. They carry `rawInput` but no `input`, so the
  output-error schema rejects them and bricks the whole follow-up turn; the
  retried call is what matters, the dead attempt carries no history value.
- app.test.ts: covers a follow-up turn whose history contains a failed
  attempt followed by a successful retry.

verify green (check + test + build).

🍳 Built with brunch
Co-Authored-By: Opus 4.8 <noreply@anthropic.com>
Comment thread src/orchestrator/src/cook-cli.ts
Comment thread src/orchestrator/src/pi-actions.ts Outdated
Comment thread src/orchestrator/src/project-detect.test.ts
Evaluate gate (phantom failure fix):
- Add `absent` TestFailureKind for vitest "No test files found" (nothing built
  yet), distinct from a genuine `test` red ("Cannot find module" stays `test`).
- runVerification aggregate precedence infra > test > absent.
- evaluate-done no longer emits `failed` for the unbuilt greenfield gate — the
  slice stays running and flows to write-tests as the SAME attempt, killing the
  phantom n/max + ✗ NEEDS WORK on every clean slice.
- Strip `[agent-tail]` harness noise from captured runner output.

Promotion ref namespace + composer:
- run-refs.ts: centralize the run git-ref namespace (brunch/run/<id>,
  brunch/slice/<id>/<sid>), replacing duplicated cook/ + cook-slice/ strings;
  prune stale worktree registrations before `git worktree add`. On-disk
  .brunch/cook/runs/ path unchanged (FE-743 compat).
- run-artifact.ts: fold per-slice branches into the run branch via
  `git merge-tree` plumbing in dependency order — real conflicts surface
  fail-closed, one merge node per slice, `squash` granularity retained. Pure
  plumbing: no working tree / index / active branch touched. Not yet wired into
  cook-cli (needs a live-run check of the dependency-seed interaction).

🍳 Built with brunch

Co-Authored-By: claude-opus-4-8 <noreply@anthropic.com>
Comment thread src/orchestrator/src/epic-sandbox-merge.ts
kostandinang and others added 2 commits June 17, 2026 17:29
Cook runs write per-slice worktrees under .brunch/cook/runs/<id>/worktree/,
which Vite's default file watcher picks up — every slice edit triggered a
page reload and tsconfig-change full reload in the host dev server.

- vite-client.ts: server.watch.ignored = ['**/.brunch/**']. Merged with
  Vite's built-in ignores (.git, node_modules, cacheDir), so run artifacts
  no longer drive HMR.

🍳 Built with brunch

Co-Authored-By: claude-opus-4-8 <noreply@anthropic.com>
project-detect.test.ts had the "resolves the runner from workspace
packages in a monorepo" describe block verbatim twice — remove the copy.

🍳 Built with brunch

Co-Authored-By: claude-opus-4-8 <noreply@anthropic.com>

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 4 total unresolved issues (including 2 from previous reviews).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit f96b365. Configure here.

Comment thread src/orchestrator/src/pi-actions.ts
const relocated = withTestDir(PROFILES['node-vitest'].toolchain, 'src');
expect(relocated.testCommand('src/x.test.ts')).toEqual(['npx', 'vitest', 'run', 'src/x.test.ts']);
});
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated withTestDir test block

Low Severity

The same describe('withTestDir relocates test targets…') suite appears twice in project-profile.test.ts, with identical cases duplicated verbatim starting at line 168.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit f96b365. Configure here.

kostandinang and others added 6 commits June 17, 2026 17:41
Greenfield serves accepted --land but never landed: landing only runs in
the brownfield (codebase) promotion path. Greenfield has no repo branch to
land onto, so surface the no-op instead of silently swallowing the flag.

🍳 Built with brunch

Co-Authored-By: Opus 4.8 <noreply@anthropic.com>
write-tests and write-code emit a failed slice event before rethrowing on
runPi failure; verify-epic did not, so epic-verification failures left the
slice grid without a failed row or reason. Mirror the wrapper on the epic's
representative slice.

🍳 Built with brunch

Co-Authored-By: Opus 4.8 <noreply@anthropic.com>
mergeSlicesIntoEpicSandbox builds __epic__ trees via walkFiles, which skips
symlinks, so the parent's symlinked node_modules never reached the merged
cwd. verify-epic and probes then ran without resolvable deps even though
per-slice verification passed. Re-link the shareable entries after the merge.

🍳 Built with brunch

Co-Authored-By: Opus 4.8 <noreply@anthropic.com>
The `withTestDir relocates test targets…` describe block was copied verbatim
a second time. Remove the duplicate.

🍳 Built with brunch

Co-Authored-By: Opus 4.8 <noreply@anthropic.com>
Cook actions (write-tests/write-code/verify-epic) and the interview already
run on claude-opus-4-8; the architect default lagged at claude-opus-4-6.
Lift it to 4-8 so plan, cook, and interview paths share the latest Opus.

🍳 Built with brunch

Co-Authored-By: Opus 4.8 <noreply@anthropic.com>
… detail

Three serve/plan presentation fixes the dogfood run surfaced:

- Recipe banner: the header said "brunch plan" / "✓ plan" while the phase
  tracker right below it reads "recipe". Rename both to "recipe" so the
  planning surface speaks one brigade term (the CLI command stays `plan`;
  this is the phase label).
- out folder: plan-start showed the launch root (e.g. the repo dir). Emit
  the spec plan dir (.brunch/cook/specs/<id>) — the exact place plan.yaml
  lands.
- Verbose warnings: --verbose now appends a one-sentence plain-English
  account after each terse warning code (explainEmitterWarning /
  explainReconciliationWarning), so a reviewer needn't know the code
  vocabulary. Default verbosity keeps the terse line.

Also fix a pre-existing typecheck break in run-artifact.test.ts (mode
'codebase' → 'greenfield'; PlanMode is greenfield|brownfield) that was
reddening the gate.

🍳 Built with brunch

Co-Authored-By: claude-opus-4-8 <noreply@anthropic.com>
@kostandinang kostandinang changed the title FE-864: Orchestrator improvements FE-864: Orchestrator operational improvements (serve reliability + Opus 4.8) Jun 17, 2026
@kostandinang kostandinang changed the title FE-864: Orchestrator operational improvements (serve reliability + Opus 4.8) FE-864: Orchestrator operational improvements Jun 17, 2026
@kostandinang kostandinang changed the title FE-864: Orchestrator operational improvements FE-883: Orchestrator operational improvements (serve reliability + Opus 4.8) Jun 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant